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Abstract A methodology that can generate the optimal coefficients of a numeri- 
cal method with the use of an artificial neural network is presented in this work. 
The network can be designed to produce a finite difference algorithm that solves 
a specific system of ordinary differential equations numerically. The case we are 
examining here concerns an explicit two-stage Runge-Kutta method for the nu- 
merical solution of the two-body problem. Following the implementation of the 
network, the latter is trained to obtain the optimal values for the coefficients of 
the Runge-Kutta method. The comparison of the new method to others that are 
well known in the literature proves its efficiency and demonstrates the capability 
of the network to provide efficient algorithms for specific problems. 

Keywords Feedforward artificial neural networks, Gradient descent, Backprop- 
agation, Initial value problems, Ordinary differential equations, Runge-Kutta 
methods 

1 Introduction 

The literature that involves solving ordinary or partial differential equations with 
the use of artificial neural networks is quite limited, but has grown significantly in 
the past decade. To name a few, Dissanayake et al. used a "universal approxima- 
tor" (see [HIl]) to transform the numerical problem of solving partial differential 
equations to an unconstrained minimization problem of an objective function [3]. 
Meade et al. demonstrated feedforward neural networks that could approximate 
linear and nonlinear ordinary differential equations [I] [5]. Puffer et al. constructed 
cellular neural networks that were able to approximate the solution of various 
partial differential equations [6]. Lagaris et al. introduced a method for the solu- 
tion of initial and boundary value problems, that consists of an invariable part, 
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which satisfies by construction the initial/boundary conditions and an artificial 
neural network, which is trained to satisfy the differential equation _7_. S. He et 
al. used feedforward neural networks to solve a special class of linear first-order 
partial differential equations [8]. Jianyu et al. demonstrated a method for solving 
linear ordinary differential equations based on multiquadric radial basis function 
networks _9 . Ramuhalli et al. proposed an artificial neural network with an em- 
bedded finite-element model, for the solution of partial differential equations [10] . 
Tsoulos et al. demonstrated a hybrid method for the solution of ordinary and par- 
tial differential equations, which employed neural networks, based on the use of 
grammatical evolution, periodically enhanced using a local optimization procedure 

In all the aforementioned cases, the neural networks functioned as direct solvers 
of differential equations. In this work, however, we use the constructed neural net- 
work, not as a direct solver, but as a means to generate proper Runge-Kutta 
coefficients. In this aspect, there has been little relevant published material. For 
instance in [12] , Tsitouras constructs a multilayer feedforward neural network that 
uses input data associated with an initial value problem and trains the network to 
produce the solution of this problem as output data, thus generating (by construc- 
tion of the network) coefficients for a predefined number of Runge-Kutta stages. 
Unfortunately, the details of the implementation are not described in great depth. 

In this work we construct an artificial neural network that can generate the co- 
efficients of two-stage Runge-Kutta methods. We consider the two-body problem, 
which is a typical case of an initial value problem where Runge-Kutta methods 
apply, and therefore the resulting method is specialized in solving it. The com- 
parison shows that the new method is more efficient than the classical methods 
and thus proves the capability of the constructed neural network to create new 
Runge-Kutta methods. 

The structure is as follows: in Sections [2] and [3] we present the basic theory for 
Runge-Kutta methods and Artificial Neural Networks respectively. In Section [4] 
we present the implementation of the neural network that applies to a two-stage 
Runge-Kutta method which solves the two-body problem numerically, while in 
Section [5] the derivation of the method is provided. In Section [6] we demonstrate 
the final results along with the comparison of the new method to other well-known 
methods and finally in Section [7] we reach some conclusions about this work. 

2 Runge-Kutta methods 

We consider a two-stage Runge-Kutta method to solve the first order initial value 
problem 

v'(*)=/(t,v) and v(t ) = v (1) 

At each step of the integration interval, we approximate the solution of the initial 
value problem at t n +i, where t n +i = t n + h. For the two-stage Runge-Kutta 
method, the approximate solution v n+ i is given by 



where 



v n +l = v„ + h (61 ki + b 2 k 2 ) 



(2) 
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ki=/(i n ,v„) (3) 

k2 = f(tn + C2 h, V n + h 0,21 ki ) (4) 

The coefficients for this set of methods can be presented by the Butcher tableau 
given below 



C'2 


0,21 




bi b 2 



An explicit two-stage Runge-Kutta method can be of second algebraic order at 
most (see [13]). In order for that to hold, the following conditions must be satisfied 

h + b 2 = 1 (5) 

b 2 <=2 = ^ ( 6 ) 

The extra condition that needs to be satisfied in order for the method to be 
consistent is 

0-21 = C 2 (7) 

3 Artificial Neural Networks 

An artificial neural network (ANN) is a network of interconnected artificial 
processing elements (called neurons) that co-operate with each other in order to 
solve specific problems. ANNs are inspired by the structure and functional aspects 
of biological nervous systems and therefore present a resemblance. Haykin in |14j 
defines an artificial neural network as "a massively parallel distributed processor 
that has a natural propensity for storing experiential knowledge and making it 
available for use". ANNs, similarly to brains, acquire knowledge through a learning 
process, which is called training. That knowledge is stored in the form of synaptic 
weights, whose values express the strength of the connection between two neurons. 

There are many types of artificial neural networks, depending on the structure 
and the means of training. An ANN, in its simplest form, consists of a single 
neuron, in what we call the Perceptron model. The Perceptron is connected with 
a number of inputs: xi, X2, x n . A weight corresponds to each of the neuron's 
connections with the inputs and expresses the specific input's significance to the 
calculation of the Perceptron's output. Therefore, there is a number of weights 
equal to the number of inputs, with being the weight that corresponds to input 
Xi, and WiXi being the combined weighted input. The neuron is also connected with 
a weight, which is not connected with any input and is called bias (symbolized 
by b). The weighted input data are essentially being mapped to an output value, 
through the transfer function of the neuron. The transfer function consists of the 
net input function and the activation function. The first function undertakes 
the task of combining the various weighted inputs into a scalar value. Typically, 
the summation function is used for this purpose, thus the net input in that case 
is described by the following expression 
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n 

u = WjXj + b (8) 

For ease of use, the bias can be treated as any other weight and be represented 
as wo, with xq = 1, so the previous expression becomes 

n 

u = '^2 w i x i ( 9 ) 

The activation function receives the output u of the net input function and 
produces its own output y (also known as the activation value of the neuron), 
which is a scalar value as well. The general form of the activation function is 

y = /(«) (io) 

What is required of a neural network, such as the Perceptron, is the ability to 
learn. That means the ability to adjust its weights in order to be able to produce 
a specific set of output data for a specific set of input data. In the supervised 
type of learning (which is used to train the Perceptron and other types of neural 
networks), besides the input data, we provide the network with target data, 
which are the data that the output should ideally converge to, with the given 
set of input. The Perceptron uses an estimation of the error as a measure that 
expresses the divergence between the actual output and the target. In that sense, 
training is essentially the learning process of adjusting the synaptic weights, in 
order to minimize the aforementioned error. To achieve this, a gradient descent 
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algorithm called delta rule is employed, also known as the Least Mean Square 
(LMS) method or the Widrow-Hoff rule, developed by Widrow and Hoff in [15] . 
The training process that uses the delta rule has the following steps: 

1. Assign random values to the weights 

2. Generate the output data for the set of the training input data 

3. Calculate the error E, which is given by a norm of the differences between the 
target and the output data 

4. Adjust the weights according to the following rule 

dE dE du 

Zku, = -77- — =-77—- — =r)8xi (11) 

OWi OU OWi 

Where r\ is a small positive constant, called learning step or learning rate, 
and S (delta error) is equal to the following expression 

S = ~ (12) 
ou 

5. Repeat from step 2, unless the error E is less than a small predefined constant 
e or the maximum number of iterations has been reached 

With each iteration of the learning method, the weights are adjusted in such 
manner as to reduce the error. Notice that the correction Aun of the weights is 
proportional to the negative of J^r, since the desired goal is the minimization of 
the error. 

Strictly speaking, the Perceptron is not technically a network, since it consists 
of a single neuron. An actual network and the most common form of ANNs is 
the multilayer Perceptron (MLP), which falls in the general category of feed- 
forward neural networks (FFNNs). Feedforward are the neural networks that 
contain no feedback connections, i.e. the connections do not form a loop, but are 
instead all directed towards the output of the network. A multilayer Perceptron 
consists of various layers, namely the input layer, the output layer, and one or 
more hidden layers. The hidden layers are not visible externally of the network, 
hence the "hidden" property. Each of the layers consists of one or more neurons, 
except for the input layer, whose input nodes simply function as an entry point 
for the input data. Each layer is connected to the previous and the next layer, 
thus providing a pathway for the data to travel throughout the network. When a 
layer is connected to another, each neuron of the first layer is connected to every 
neuron of the second layer. In this way, the output of one layer becomes the input 
for the next one. Therefore, to calculate the output of the MLP, the data (either 
input data or activation values) are being propagated forward through the neural 
network. 

To train the MLP, the backpropagation method is used. Backpropagation is 
a generalization of the delta rule that is used to train the Perceptron. The method, 
essentially, functions in the same way for each neuron of the MLP, as simple delta 
rule does for the Perceptron. The difference lies in the fact that, since the learning 
procedure requires the calculation of the S error, the neurons that reside inside the 
hidden layers must be provided with the S errors of the neurons of the next layer. 
For this reason, the S errors are first calculated for the neurons of the output layer 
and are propagated backwards (hence the backpropagation term), all across the 
network. The procedure is the following: 
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1. Assign random values to the weights 

2. Generate the activation values of each neuron, starting from the first hidden 
layer and continue until the output layer 

3. Calculate the error E, which is given by a norm of the differences between the 
target and the output data 



4. Calculate the S errors of each neuron of the output layer, according to ( 12 I 

5. Calculate the S errors of each neuron of the previous layer, according to 



M 



6= f'(u)J2 w i 5 i ( 13 ) 



i=l 

Where f(u) is the activation function of the neuron, M is the number of the 
neurons in the next layer, and 6+ is the S error of the ith neuron of the next 
layer 

6. Repeat step 5 for the previous layer, until all 6 errors for all the hidden layers 
have been calculated 



7. Adjust the weights according to (111 



8. Repeat from step 2, unless the error E is less than a small predefined constant 
e or the maximum number of iterations has been reached 



4 Implementation 

We introduce a feedforward neural network that can generate the coefficients of 
two-stage Runge-Kutta methods, specifically designed to apply to the needs of the 
two-body problem. The two-body problem is described by the following system of 
equations 

x " = -^5> y" = -^S' r = y/x 2 +y 2 (14) 
whose analytical solution is given by 



x = cost, y = sini (15) 
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Since a Runge-Kutta method can only solve a system of first order ordinary 
differential equations, v according to the notation of equation |l]), can be given by 

v=[vi,V2,V3,V4] = [x,y,x',y] (16) 

and 

,r=^T^ 2 (17) 
From now on we will be using x, y, x , y , for being more straightforward than 

V!,V 2 ,V 3 ,V4. 

The neural network we have constructed has six inputs: The coordinates of 
the second body: x and y, the derivatives of the coordinates with respect to the 
independent variable t: x , y , the steplength h and a "dummy" input with the 
constant value of 1. The input data consist of a matrix, where each column is a 
vector, intended to be used by the corresponding input of the network. The matrix 
satisfies the following equation 

[x n , y n , x' n , y n , h, l] = [cos t n , sin t n , -sint„, cost,!, h, 1] (18) 

With t n+ ± = t n + h, where n = 1,2, N and N is the number of total steps. 
Similarly, the target matrix satisfies the following equation 

[x n +i, 2M+i, x' n+1 , y'n+i) = [cosf„ +1 , sint n+ i, -sint n +i, cosi„+i] (19) 

The input and the target data are generated through a certain procedure. In 
this procedure, we set a number of parameters, namely the integration interval 
[t s ,t e ], the xq, yo, x' , y' values that correspond to vo as notated in equation ([TJ, 
and the constant steplength h. With the use of the integration interval and the 
steplength, the number of steps N is calculated. For each step, the process gener- 
ates the input and target data that correspond to the starting and ending point 
of the integration step respectively. Thus, two matrices are generated by using the 
theoretical solution of the problem and consist of x,y,x',y' at the beginning and 
at the end of each integration step. 

Apart from the input layer, the network comprises of thirteen additional layers, 
each consisting of a single neuron. Each layer is connected to one or more other 
layers, always in a feedforward fashion. Most of the connections have a fixed, 
unadjustable weight, whose value is equal to one. Practically, that means that the 
significance of the specific connections does not vary in the progress of the training 
phase. The connections that do have an adjustable weight are those related with 
the three coefficients of the Runge-Kutta method, namely 021, 61, and b 2 . The 
coefficient c 2 is equal to 021, therefore it does not need to be explicitly calculated 
through the use of the neural network. 

The neural network is constructed in such manner as to replicate the actual 
Runge-Kutta methodology and produce the numerical solution for the two-body 
problem for each step of the integration. In contrast to the actual methodology, 
the approximate solution at the end of each step is not used as a starting point 
for the next step, but instead of this, theoretical values are used as starting points 
for each step's calculations. 



/(t,v) = v'(t) = [ui,«2.«3jW4] 



«3, «4, 



VI 



V 2 
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Most of the layers do not use the standard summation function as the net input 
function, but a custom one, different in most of the cases. The activation function 
in every neuron is the identity function, described by the following equation 

y = f(u) = u (20) 

Therefore, the transfer function in each case is, in essence, the net input function. 
Below, there is the list of the inputs and the layers of the network, their input and 
output connections (with any of the 6 inputs or 13 layers) and in parentheses the 
expression that they represent: 
Input 1: 

— Output connections: layers 1 [k\ x i), 2 (k ly i), 6 (k,2 X '), 7 (fey'); 10 (x n +i) 

— Value: x n 

Input 2: 

— Output connections: layers 1 (fe x '), 2 (ki y i), 6 (k 2x >), 7 (k2 y >), 11 (y n +i) 

— Value: y n 

Input 3: 

— Output connections: layers 8 (fez), 10 (xn+i), 12 (x' n+1 ) 
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- Value: x' n = k\ x 
Input 4: 

- Output connections: layers 9 (kiy), H (jm+i), 13 (y' n +\) 

- Value: y' n = k\ y 

Input 5: 

- Output connections: layers 3 (ha,2i), 10 (a; n+ i), 11 (j/„+i), 12 {x' n+1 ), 13 {y' n +\) 

- Value: h 

Input 6: 

- Output connections: layers 4 (b±), 5 (62) 

- Value: 1 

Layer 1: 

- Input connections: inputs 1 (i„), 2 

- Output connections: layers 6 (k2 X >), 7 (fey), 12 (x' n+1 ) 

- Net input function: f x >{x n ,y n ) = - ^J^ y2 y 

- Output value: k\ x i 
Layer 2: 



Input connections: inputs 1 (x n ), 2 (y n ) 

Output connections: layers 6 (k2 X i), 7 (k2 V '), 13 (y' n +i) 

Net input function: f y >(x n ,y n ) = - j 7 " = — 

Output value: k\ y i 



Layer 3: 



Input connections: input 5 (h) 

Output connections: layers 6 (A^'), 7 (fc 2 y')' 8 (fc2x)> 9 (fey) 
Net input function: 
Output value: h 021 



Layer 4: 



- Input connections: input 6 (1) 

- Output connections: layers 10 (x n+ i), 11 (y n +i), 12 (x' n+1 ), 13 

- Net input function: ^ 

- Output value: b\ 

Layer 5: 

- Input connections: input 6 (1) 

- Output connections: layers 10 (x n+ i), 11 (y n +i), 12 (x' n+1 ), 13 (y' n+1 ) 

- Net input function: Y^, 

- Output value: 62 

Layer 6: 

- Input connections: inputs 1 (x n ), 2 (y n ), layers 1 (k lx >), 2 (k ly >), 3 (ha,2i) 

- Output connections: layer 12 (a4 + i) 

- Net input function: f 2x > = f x >(x n + ha 2 \ k lx < , y n + ha 2 i k ly ,) 

- Output value: k 2x > 
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Layer 7: 

- Input connections: inputs 1 (x n ), 2 (y n ), layers 1 (k lx >), 2 (k ly >), 3 {ha 2 \) 

- Output connections: layer 13 (y' n +i) 

- Net input function: f 2y i = f y >(x n + ha 2 i k lx >, y n + ha 2 i k ly i) 

- Output value: k 2y > 

Layer 8: 

- Input connections: input 3 (x' n = ki x ), layer 3 {ha 2 \) 

- Output connections: layer 10 (x n +i) 

- Net input function: f 2 = x' n + ha 2 ± k\ x = x n ■ (1 + ha 2 i) 

- Output value: k 2x 

Layer 9: 

- Input connections: input 4 (y' n = ki y ), layer 3 (ha 2 i) 

- Output connections: layer 11 (y n +i) 

- Net input function: f 2 = y' n + ha 2 ± k\ y = y' n ■ (1 + ha 2 i) 

- Output value: k 2y 

Layer 10: 

- Input connections: inputs 1 (x n ), 3 (x' n = ki x ), 5 (h), layers 4 (61), 5 (62), 8 
(k 2x ) 

- Output connections: network output 

- Net input function: = x n + h ■ (bi k\ x + b 2 k 2x ) 

- Output value: x n +i 

Layer 11: 

- Input connections: inputs 2 (y n ), 4 (y' n = ki y ), 5 (h), layers 4 (61), 5 (62), 9 
(k 2y ) 

- Output connections: network output 

- Net input function: h=y n + h- (b\ ki y + b 2 k 2y ) 

- Output value: y n +i 

Layer 12: 

- Input connections: inputs 3 (x' n = k\ x ), 5 (h), layers 1 (k lx >), 4 (61), 5 (62), 6 

- Output connections: network output 

- Net input function: fs=x' n + h- (61 k lx > + b 2 k 2x i) 

- Output value: x' n+1 

Layer 13: 

- Input connections: inputs 4 (y' n = ki y ), 5 (h), layers 2 (k ly i), 4 (61), 5 (62), 7 
{k 2y ') 

- Output connections: network output 

- Net input function: f3=y' n + h- (61 k ly i + b 2 k 2y >) 

- Output value: y' n+ i 
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5 Method derivation 

The derivation of the coefficients for the RK method described in pj), includes 
certain subroutines: 

— Data generation 

— Neural network training 

— Conversion to fraction 

During the data generation phase, we generate the input and target data that 
are going to be used to train the neural network. We have used various integration 
intervals [i s ,t e ], from [0, 2n] up to [0,2007r] and various steplengths, from h = 5 
down to ■ A s we mentioned before, the input and the target data are described 



by the equations ( 18 1 and ( 19 1 respectively. Therefore, after the data generation, 
we obtain an input matrix of size 6 x N and a target matrix of size 4 x N. Where 
N is the number of steps and the following equation holds 

N= te_U (21) 

During the training phase, we use the generated data to train the neural net- 
work. The method used to conduct the training is a variation of the typical back- 
propagation method, called Gradient Descent with Momentum and Adaptive 
Learning Rate Backpropagation or GDX (see [16] ) . We use this training method 
since it has yielded the best results performance-wise, in comparison with the rest. 
Instead of a standard error estimation function, we use a custom one for this pur- 
pose, which is shown below 

||Output - Target||oo + \bi + b 2 - 1| + |a 2 i &2 - si ( 22 ) 

The error to be minimized is the maximum absolute difference between the output 
and the target data, plus some added expressions to satisfy the algebraic condi- 
tions. The terms \b± + 62 — 1| and \a21b2 — \\ are used to satisfy the conditions 
([5]) and ([6| respectively. The coefficient £121 is used interchangeably with C2, due 
to 0. 

As a result of the training phase, the obtained coefficients 021, b± and 6 2 sat- 
isfy the conditions to a certain degree of accuracy. Additionally, the Runge-Kutta 
method constructed with the generated coefficients can produce an output, suffi- 
ciently close to the target data. 

During the last phase, the coefficient 021 is converted into a fraction to simplify 
the method, with an insignificant loss of accuracy. The rest of the coefficients 
are calculated with the use of the fractional 021, in order for the algebraic and 
consistency conditions, ([5]), ^ and ^ respectively, to be satisfied. The coefficients 
of the new method are presented in Table [T] 



6 Results 

The best method was provided by using the integration interval [0, 2n] and the 
steplength h = y|g . Training with other integration intervals and steplengths have 
returned similar results. 2tt represents a full oscillation for the two-body problem, 
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which in part explains why training over wider intervals does not yield better 
results. Apart from the new method, the training has also generated some well 
known classical methods, some of which are given later in this section. 

We compare the new method with three classical two-stage explicit Runge- 
Kutta methods. The corresponding Butcher tableaus of all methods are shown in 
Tables [QIU 
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Table 1 New method 



1 


1 




1 


1 




2 


2 



Table 2 Heun's method 



1 


1 




2 


a 









1 



Table 3 Midpoint method 



2 2 




Table 4 Third classical method 



A comparison between the classical methods and the new method is presented 
in figure [4] which regards the numerical integration of the two-body problem over 
the interval [0,1000]. The vertical axis represents the accuracy as expressed by 
—logio (maximum absolute error), while the horizontal axis represents the total 
number of function evaluations that are required for the computation. The latter 
is provided by the formula F.E. = s ■ N, where s = 2 and stands for the number of 
stages of the RK method and is the number of total steps. We can see that the 
new method is more efficient among all methods compared. 

7 Conclusions 

We have constructed an artificial neural network that can generate the optimal 
coefficients of an explicit two-stage Runge-Kutta method. The network is specifi- 
cally designed to apply to the needs of the two-body problem, and therefore the 
resulting method is specialized in solving that problem. Following the implemen- 
tation of the network, the latter is trained to obtain the optimal values for the 
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Two-body Problem 



-•-New Method 
Midpoint 
Classical 3 




2.5E+05 2.5E+06 2.5E+07 2.5E+08 

Function Evaluations 



Fig. 4 Efficiency of the compared methods for the integration of the two-body problem 



coefficients of the method. The comparison has shown that the new method devel- 
oped in this work is more efficient than other well known methods and thus proved 
the capability of the constructed neural network to create new efficient numerical 
algorithms. 
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